fix: add missing pmie webhook action configuration functionality #181

Closed

Conversation

richm
Collaborator

@richm richm commented Dec 4, 2023

Resolves Red Hat issue RHEL-13760

@richm richm requested a review from natoscott as a code owner December 4, 2023 21:30
@richm
Collaborator Author

richm commented Dec 4, 2023

[citest]

Collaborator

@natoscott natoscott left a comment

@richm thanks for fixing this up while I was away. Not sure what I was thinking with -e/-E there .. my brain must've been in awk mode. :-)

@richm richm force-pushed the webhook-event-notification branch from abf7a48 to 635c35e Compare December 5, 2023 00:52
@richm
Collaborator Author

richm commented Dec 5, 2023

[citest]

@richm
Collaborator Author

richm commented Dec 5, 2023

@natoscott The latest sync of code from ansible-pcp brings tests/tests_verify_pmie_webhook.yml and check_pmie_webhook.yml, which seem very similar to the newly proposed tests tests_verify_notification.yml and check_notification.yml - is there some overlap? Should we get rid of one or the other?

@natoscott
Collaborator

> @natoscott The latest sync of code from ansible-pcp brings tests/tests_verify_pmie_webhook.yml and check_pmie_webhook.yml, which seem very similar to the newly proposed tests tests_verify_notification.yml and check_notification.yml - is there some overlap? Should we get rid of one or the other?

I think we need both, as ansible-pcp is usable without the metrics role, but our QE folks will want to verify with the metrics role as well.

@richm
Collaborator Author

richm commented Dec 5, 2023

[citest]

@richm
Collaborator Author

richm commented Dec 5, 2023

[citest]

@richm
Collaborator Author

richm commented Dec 5, 2023

@natoscott this is looking better, but we still have the following problems:

  • cannot restart pmlogger on centos7
  • problems with redis on fedora
  • problems with grafana on rhel 9

Can you take a look at these? They are unrelated to this PR, probably caused by something in that last sync with ansible-pcp

@natoscott
Collaborator

> Can you take a look at these? They are unrelated to this PR, probably caused by something in that last sync with ansible-pcp

Sure thing, thanks for the heads-up @richm - will let you know what I find.

@natoscott
Collaborator

@richm OK, have discovered a few things. There are three failures causing all CI issues:

  1. on CentOS-7 there's a strange pmlogger service failure - this is not a known issue, and nothing has changed in el7 PCP for a long time - I wonder if this might be the result of SELinux or the firewall (external factors). Can we extract the journalctl output from the command indicated in the failure message, from a CI system?
  2. on Fedora-37 and Fedora-38 the new pmie-webhook test is failing with "Could not find the requested service redis: host" - Redis isn't used here though. So, I think the role itself is actually passing, and it's a test failure that seems to be related to the way we interrogate service state. It might be fixed by the patch below, WDYT?
  3. all RHEL-9.4 based test cases are failing with the grafana-server core dump you observed. This is not a known issue and I'm chasing it down further with QE and Grafana devs in the team ... still WIP. There was a recent change to add SELinux policy to Grafana in RHEL, so I'm wondering if this could be involved. Will get back to you tomorrow hopefully on this one.
tests/restore_services_state.yml
--- /tmp/git-blob-AYZrfz/restore_services_state.yml     2023-12-11 14:04:12.002996635 +1100
+++ tests/restore_services_state.yml    2023-12-11 13:57:08.143243811 +1100
@@ -16,6 +16,7 @@
   when:
     - item + '.service' in final_state.ansible_facts.services
     - item + '.service' in initial_state.ansible_facts.services
+    - initial_state.ansible_facts.services[item + '.service']['status'] != 'not-found'
   with_items:
     - pmcd
     - pmlogger
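
To illustrate why the extra condition in the patch above might help, here is a minimal Python sketch of the three `when:` checks. It assumes the service facts have the shape Ansible's service_facts module reports on systemd hosts (a dictionary keyed by unit name, each entry carrying `state` and `status`). The function name `services_to_restore`, the sample data, and the presence of `redis` in the list of restored services are assumptions made for illustration, not taken from the test suite.

```python
# Hypothetical model of the `when:` conditions in tests/restore_services_state.yml.
# The dictionaries mimic the assumed shape of ansible_facts.services; all
# sample values below are made up for illustration.

def services_to_restore(names, initial, final):
    """Return the service names that pass all three conditions from the patch."""
    selected = []
    for name in names:
        unit = name + ".service"
        # Existing conditions: the unit must appear in both fact snapshots.
        if unit not in final or unit not in initial:
            continue
        # New condition: skip units that systemd lists but cannot find on disk.
        if initial[unit]["status"] == "not-found":
            continue
        selected.append(name)
    return selected


initial_state = {
    "pmcd.service": {"state": "running", "status": "enabled"},
    "pmlogger.service": {"state": "running", "status": "enabled"},
    # Assumed example: a unit referenced by other units but not installed.
    "redis.service": {"state": "stopped", "status": "not-found"},
}
final_state = dict(initial_state)

print(services_to_restore(["pmcd", "pmlogger", "redis"], initial_state, final_state))
```

Without the third check, `redis` would pass the two membership tests and the restore step would then try to manage a service that does not exist on the host, which would be consistent with the "Could not find the requested service" failure described above.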

@natoscott
Collaborator

@richm have confirmed with our Grafana gurus that an SELinux AVC has been observed to cause a Grafana core dump just recently. There is also a fixed grafana-selinux policy build recently submitted in 9.4 that may resolve this issue - if not, it's quite likely to be a new SELinux issue we've not seen before.

@richm
Collaborator Author

richm commented Dec 12, 2023

closing in favor of #183

@richm richm closed this Dec 12, 2023
@richm richm deleted the webhook-event-notification branch December 12, 2023 21:50